Machine Learning algorithm unveils glutamatergic alterations in the post-mortem schizophrenia brain

Schizophrenia is a disorder of synaptic plasticity and aberrant connectivity in which a major dysfunction in glutamate synapse has been suggested. However, a multi-level approach tackling diverse clusters of interacting molecules of the glutamate signaling in schizophrenia is still lacking. We investigated in the post-mortem dorsolateral prefrontal cortex (DLPFC) and hippocampus of schizophrenia patients and non-psychiatric controls, the levels of neuroactive d- and l-amino acids (l-glutamate, d-serine, glycine, l-aspartate, d-aspartate) by HPLC. Moreover, by quantitative RT-PCR and western blotting we analyzed, respectively, the mRNA and protein levels of pre- and post-synaptic key molecules involved in the glutamatergic synapse functioning, including glutamate receptors (NMDA, AMPA, metabotropic), their interacting scaffolding proteins (PSD-95, Homer1b/c), plasma membrane and vesicular glutamate transporters (EAAT1, EAAT2, VGluT1, VGluT2), enzymes involved either in glutamate-dependent GABA neurotransmitter synthesis (GAD65 and 67), or in post-synaptic NMDA receptor-mediated signaling (CAMKIIα) and the pre-synaptic marker Synapsin-1. Univariable analyses revealed that none of the investigated molecules was differently represented in the post-mortem DLPFC and hippocampus of schizophrenia patients, compared with controls. Nonetheless, multivariable hypothesis-driven analyses revealed that the presence of schizophrenia was significantly affected by variations in neuroactive amino acid levels and glutamate-related synaptic elements. Furthermore, a Machine Learning hypothesis-free unveiled other discriminative clusters of molecules, one in the DLPFC and another in the hippocampus. Overall, while confirming a key role of glutamatergic synapse in the molecular pathophysiology of schizophrenia, we reported molecular signatures encompassing elements of the glutamate synapse able to discriminate patients with schizophrenia and normal individuals.


INTRODUCTION
Schizophrenia has been conceptualized as a disease of dysfunctional synaptic plasticity 1 and aberrant cortical-subcortical connectivity 2,3 , with a multigenic etiopathogenesis and a complex biological architecture, which likely involves different biological pathways 4,5 . Genome Wide Association Studies (GWASs) have confirmed that a large pool of genetic variants are associated with the disorder and that genetic risk identified by such variants converges onto a relatively small number of biologically meaningful trajectories or pathways 6 . Interestingly, about 30% of genomic variation associated with schizophrenia by GWASs is directly or indirectly related with the glutamatergic signaling in the central nervous system, confirming the hypothesis that alteration of the glutamate synapse may play a critical role in the pathophysiology of this disorder (glutamatergic hypothesis of schizophrenia) 5 . Consistently, alteration in the overall glutamatergic synapse composition, which includes glutamatergic NMDA, AMPA and metabotropic receptors (NMDARs, AMPARs, and mGluRs, respectively), along with scaffolding and adaptor proteins 7-10 , as well as membrane and vesicular transporters [11][12][13][14] may occur in the brain of schizophrenia patients. In line with such evidence, genomic studies have reported that glutamate-pathway-specific polygenic risk scores predict behavioral and neuroimaging endophenotypes of schizophrenia supported by dysfunctions of the hippocampus and prefrontal cortex (PFC), two brain regions of critical importance to the pathophysiology of schizophrenia 15,16 . In particular, it has been suggested that dysregulation of the PFC in schizophrenia is secondary to an early alteration of the glutamatergic neurotransmission in the hippocampus 17 . However, to what extent each element of glutamate neurotransmission contributes to the appearance of clinical phenotypes of schizophrenia still remains obscure. Similarly, it stands unclear whether variation of one single element, rather than the coordinated variation of levels of many elements of the glutamatergic synapse is critical to the onset of the disorder.
To shed new light on such cryptic areas of our understanding for schizophrenia, here we performed a multimodal research of post-mortem dorsolateral PFC (DLPFC) and hippocampus of schizophrenia patients and non-psychiatric individuals on which we used high-performance liquid chromatography (HPLC) analysis to measure levels of glutamate and related amino acids with neurotransmitter/neuromodulatory activity, including D-amino acids, whose role as endogenous NMDAR modulators is emerging significantly in schizophrenia pathophysiology 18,19 . Moreover, in the same brain areas, by quantitative RT-PCR (qRT-PCR) and western blotting analyses we investigated, respectively, mRNA and protein levels of pre-and post-synaptic key molecules, including glutamate receptors (NMDAR, AMPAR, metabotropic), their interacting scaffolding proteins (PSD-95, Homer1b/c), plasma membrane and vesicular glutamate transporters (EAAT1, EAAT2, VGluT1, VGluT2), enzymes involved either in glutamate-dependent GABA neurotransmitter synthesis (GAD65 and 67), or in postsynaptic NMDAR-signaling, such as CAMKIIα and the pre-synaptic marker Synapsin-1 20,21 . We used a multistep approach to analyze data: (1) a univariable statistical analysis to detect possible significant differences between patients and controls in the levels of molecular and neurochemical elements previously mentioned; (2) a hypothesis-driven multivariable statistical analysis, and (3) a hypothesis-free Machine Learning analysis. The latter allowed us to (a) detect pathways of molecules able to discriminate schizophrenia cases from controls on the basis of the joint distribution of their levels in the DLPFC and the hippocampus; (b) identify interactions among elements of the synapse that defined molecular signatures of schizophrenia cases and controls.

RESULTS
HPLC analysis of neuroactive D-and L-amino acids levels in the post-mortem DLPFC and hippocampus of schizophrenia and control subjects Compelling evidence supports the hypothesis that deficient glutamatergic activity contributes in schizophrenia etiology and pathophysiology 22 . Accordingly, deregulation of glutamate and neuroactive D-amino acids levels has been reported in schizophrenia brains 5,23,24 . Here, we measured by HPLC the levels of the amino acids L-Glu, L-Asp, D-Asp, D-Ser, Gly, known to stimulate and modulate the activity of NMDARs [25][26][27] , and their precursors, L-Gln, L-Asn, and L-Ser in the DLPFC and hippocampus of schizophrenia patients and non-psychiatric controls (n = 20/brain region/clinical condition). Before proceeding with statistical comparisons of amino acids levels between schizophrenia and controls, we assessed whether the two groups were imbalanced with respect to the following clinical variables (confounders): gender, age at deceased, post-mortem interval (PMI) and samples' pH. No statistically significant differences were found in gender (number of males (%): CTRL = 16 (80%), SCZ = 12 (60%), χ 2 = 1.071, df=1, p = 0.301 from Chi-Square test) and pH (median [IQR]: CTRL = 6.54 [6.49-6.63], SCZ = 6.50 [6.42-6.56 Fig. 1g, Table 2). On the other hand, no significant differences were found in D-Asp, L-Asp, L-Ser, L-Glu, and L-Gln levels, as well as in D-Asp/total Asp, D-Ser/total Ser and L-Gln/ L-Glu ratios ( Fig. 1c- Fig. 1q), while no alterations in other amino acids were detected ( Fig. 1n-p,r-x, Table 2).
However, after the correction of p values for multiple testing, no statistically significant differences were found in any of the molecules analyzed in both brain regions between schizophrenia and non-psychiatric subjects ( Fig. 1, Table 2).
As regards to proteins, we observed a significantly higher levels in EAAT1, and lower levels in GluN2A and EAAT2 in the DLPFC of schizophrenia patients, compared with non-psychiatric controls  Table  5). EAAT2 levels were significantly lower also in the hippocampus c-x Content of c, n D-aspartate, d, o L-aspartate, e, p D-aspartate/total aspartate ratio, f, q L-asparagine, g, r D-serine, h, s L-serine, i, t D-serine/ total serine ratio, j, u glycine, k, v L-glutamate, l, w L-glutamine levels and m, x L-glutamine/L-glutamate evaluated in the post-mortem c-m dorsolateral prefrontal cortex (DLPFC) and n-x hippocampus, compared between controls (CTRL) and patients with schizophrenia (SCZ). In each sample, all the amino acids were detected in a single run by HPLC and expressed as nmol/g of tissue, while the ratios are expressed as percentage (%). The number of examined samples is reported in Table 2, for each considered amino acid.       Table 5). However, also in this case, after the correction of p values for multiple testing, no statistically significant differences were found in any of the analyzed transcripts or proteins between Linear combinations of multiple molecules of the glutamatergic synaptic components are predictive of schizophrenia in the post-mortem DLPFC Here, we generated arbitrary (hypothesis-driven) multivariable logistic models, defined on the basis of the functional interaction among different neurochemical and molecular elements of the glutamatergic synapse and tested whether linear combination of these molecules could discriminate between control and schizophrenia group, in both the DLPFC and hippocampus (Table 3,  Supplementary Table 6).
As already shown, subject's age at deceased and PMI were strongly predictive of the presence of schizophrenia. When both included in a multivariable logistic regression (reference model), increasing age resulted associated with a lower disease probability (OR = 0.88, 95% CI: 0.80-0.94, p = 0.003) whereas increasing PMI was associated with a higher disease probability (OR = 1.22, 95% CI: 1.02-1.54, p = 0.048). Both covariates discriminated schizophrenia patients from controls with a very high predictive accuracy (AUC = 0.90, 95% CI: 0.80-0.98) and therefore these were necessarily accounted as strongest confounders for the analyses of multivariable logistic models.
Significant associations were found using molecules from the DLPFC only. Indeed, higher levels of both GluN1 and D-Ser levels resulted still associated to higher disease probability (both OR ≥ 1.00) and significantly outperformed the reference model, which included age and PMI only (χ 2 = 8.515, df = 2, p = 0.014 from deviance test). Moreover, the linear combination of L-Glu, mGluR2/ 3, mGluR5 and EAAT2 (χ 2 = 11.065, df = 4, p = 0.026 from deviance test), as well as GluA1 and PSD-95 (χ 2 = 7.945, df = 2, p = 0.019 from deviance test) significantly outperformed the reference model, evidencing their contribution for the improvement in statistical association.
Machine Learning analysis finds pathways of molecules of the glutamatergic synapse that are predictive of schizophrenia in the post-mortem DLPFC and hippocampus Results from iterative Random Forests (iRFs) at the last iteration are reported both in the DLPFC (Fig. 4) and hippocampus (Fig. 5). A graphical plot of the Brier Scores achieved by iRFs in the OOB data at different parameter values (i.e. during the "tuning phase") is reported in Supplementary Fig. 3 whereas results of a finer grid search for the optimal number of iterations and regularization factor, among iRFs with 100,000 trees, is reported in Supplementary Table 7. As for DLPFC, iRF achieved a relatively small prediction error (Brier Score = 0.186) and a very high discriminatory power (AUC = 0.80, 95% CI: 0.65-0.92). The molecules that mostly contributed to discriminate schizophrenia from control were: VGluT2, EAAT2, GAD67, D-Asp/total Asp, D-Ser, PSD-95, GRIA1 and GRM5 whereas the ones that barely contributed to the discrimination were: L-Gln/L-Glu ratio, D-Ser/total Ser, EAAT1, GRIN2A and Gly (Fig. 4a). The pathway of the most stable interactions (recurrently recovered in the trees of the forest and with a stability score > 0.20) were graphically represented in Fig.  4b. Prevalent interactions were found between D-Ser and D-Asp/ total Asp, between D-Asp/total Asp and GRIA1, between D-Ser and the following molecules: PSD-95, GRIA1, VGluT2, EAAT2, and GAD67. To understand whether the relationship between the levels of each single molecule and the probability of having schizophrenia is linear, monotonic or more complex, accumulated local effect (ALE) was estimated with respect to the most important variables only (Fig. 4c). Higher VGluT2 levels were associated to a linear decreasing in schizophrenia probability whereas a clearly non-linear relationship (sigmoid curves) was found with respect to EAAT2, GAD67, D-Asp/total Asp, and D-Ser. As for EAAT2, GAD67, D-Asp/total Asp, the probability of schizophrenia was higher with respect to their lower levels and then become lower with respect to their higher levels. In contrast, for D-Ser, the probability of schizophrenia was lower with respect to its lower levels and then become higher with respect to its higher levels. Moreover, to investigate in which "direction" the molecules interact with one another (Fig. 4b), i.e. locating those regions at which the disease more likely occurred, partial dependence plots (PDPs) were produced only for those features with top stable interactions (stability score > 0.60). As shown in Fig. 4d, subjects with D-Asp/total Asp ratio lower than 1.00 and with D-Ser levels greater than 200 nmol/g of tissue achieved about 80% probability of having schizophrenia whereas, in the opposite region, such probability was dramatically reduced about to 20%.
As for hippocampus, iRF achieved a smaller prediction error (Brier Score = 0.165) and a higher discriminatory power (AUC = 0.85, 95% CI: 0.71-0.95). The molecules that mostly contributed to discriminate schizophrenia from controls were EAAT2, CAMK2A, Synapsin-1, L-Asn, GRIA2, GRIA4, and SLC17A7 whereas the ones that barely contributed to the discrimination were: Gly, mGluR5, CaMKIIα, VGluT1, and D-Asp (Fig. 5a). The pathway of the most stable interactions was represented in Fig. 5b. The most recurrent interactions were found with respect to EAAT2 levels. As shown by ALE plots (Fig. 5c), higher EAAT2 levels (% of control) were associated to a relevant linear decrease in schizophrenia probability, higher L-Asn levels (nmol/g of tissue) were associated to a relevant linear increase in schizophrenia probability whereas a non-linear relationship was detected with respect to the rest of the molecules. Moreover, PDP (Fig. 5d) suggested that subjects with lower levels of EAAT2 in conjunction with lower levels of SLC17A7 or Synapsin-1 were more likely to achieve higher probability of schizophrenia.
Moreover, Classification And Regression Tree (CART) showed that in the DLPFC, subjects with D-Ser ≥ 185 nmol/g achieved 85% chance of having the schizophrenia whereas those with D-Ser < 185 nmol/g were only 21% more likely to have the disease ( Supplementary Fig. 4). In the hippocampus, subjects with CAMK2A expression < 0.79 and (at the same time) with GRIA2 expression < 0.64 achieved 95% chance of having the schizophrenia whereas those with CAMK2A expression ≥ 0.79 and EAAT2 ≥ 37 % were more likely to do not have the disease at all (probability of 0%). The discriminatory accuracy achieved by both CARTs was AUC = 0.73 (95% CI: 0.60-0.85) and AUC = 0.92 (95% CI: 0.83-0.99) for DLPFC and hippocampus data, respectively. Glutamatergic synapse-related protein expression in the post-mortem dorsolateral prefrontal cortex and hippocampus of control subjects and patients with schizophrenia. a, a′ Representative autoradiograms of immunoblots of the glutamatergic synapse-related proteins performed in the dorsolateral prefrontal cortex and hippocampus of non-psychiatric controls (CTRL) and patients with schizophrenia (SCZ). Quantification of b, b′ GluN1, c, c′ GluN2A, d, d′ GluN2B, e, e′ GluA1, f, f′ GluA2/3, g, g′ GluA4, h, h′ mGluR1, i, i′ mGluR2/3, j, j′ mGluR5, k, k') Homer1b/c, l, l′ PSD-95, m, m′ GAD65, n, n′ GAD67, o, o′ EAAT1, p′, p′ EAAT2, q, q′ VGluT1, r, r′ VGluT2, s, s′ CaMKIIα, t, t′ Thr286-P-CaMKIIα and u, u' Synapsin-1 protein levels in the a-u dorsolateral prefrontal cortex and a′-u′ hippocampus of control subjects and patients with schizophrenia. GAPDH was used to normalize for variations in loading and transfer. The number of examined samples is reported in Supplementary Table 5, for each considered protein. Raw blots are shown in the Supplementary Fig. 1 (dorsolateral prefrontal cortex) and Supplementary Fig. 2 (hippocampus).
A. De Rosa et al.

DISCUSSION
Despite the recognized role of the glutamate system in the molecular pathophysiology of schizophrenia 5,30 , studies in post-mortem brains analyzing multiple clusters of molecules related to the glutamate signaling are missing. In the present work, we exploited this strategy, measuring both mRNA and protein expression of fundamental molecules at the glutamatergic synapse, as well as neuroactive amino acids levels in the postmortem DLPFC and hippocampus of schizophrenia patients and non-psychiatric controls. Paradoxically, univariate analyses revealed that, after the adjustment for multiple comparisons, none of the molecules we investigated was significantly different between schizophrenia patients and controls in the post-mortem DLPFC and hippocampus. Nonetheless, when shifting the approach from a univariate perspective to multivariate hypothesis-driven and hypothesis-free strategies, we discovered that the odds of belonging to the schizophrenia instead of the control group were significantly affected by variations in amino acids and glutamate-related synaptic elements. The hypothesis-driven strategy revealed three molecular patterns including: (1) the GluN1 subunit of the NMDAR and its ligand, D-Ser, the major NMDAR co-agonist in the forebrain 31 ; (2) L-Glu, the metabotropic receptors, mGluR2/3, mGluR5, along with the glutamate transporter EAAT2; (3) the scaffolding protein PSD-95 and the AMPAR subunit GluA1. On the other hand, the hypothesis-free strategy evidenced two robust and stable molecule pathways, one in the DLPFC and one in the hippocampus, whose levels were correlated within stable statistical interactions and predictive of each individual classification as a schizophrenia patient or control.
In detail, the cluster including the GluN1 subunit of NMDARs, prevalently distributed at the post-synaptic portion of the synapse, and its respective ligand, D-Ser, is of interest to both the pathophysiology and the treatment of schizophrenia. First of all, a reduction of GluN1 in the post-mortem PFC of patients with schizophrenia has been reported in different cohorts 23,32,33 , and has been suggested to modify NMDAR stoichiometry, therefore being responsible for the endogenous NMDAR deficit reported in schizophrenia 32 . On the other hand, multiple lines of evidence indicate D-Ser as a major modulator of NMDAR function with potential therapeutic effects 31 and in vivo significant implication as an auditory 34 and cognitive enhancer 35,36 in schizophrenia. Moreover, different studies pointed to D-Ser as a potential biomarker [37][38][39] , given its reduced levels in the serum and CSF of schizophrenia patients, compared to controls [40][41][42] . However, other investigations and meta-analysis study reported no alterations in D-Ser levels in the blood and CSF 43,44 , as well as in the postmortem brain of schizophrenia patients 45 , as also found in the present work. Consistent with the relevance of this issue, other investigations call to clarify such controversial results.
By adopting the same approach as above, we also identified a cluster of post-synaptic proteins, namely the ionotropic AMPAR GluA1 subunit and PSD-95, which have been both implicated in schizophrenia 46,47 and are synergically involved in the postsynaptic glutamate signaling along with synaptic neuroplasticity rearrangements of relevance to schizophrenia pathophysiology 48,49 . Indeed, this cluster of molecules is highly representative of the molecular machinery responsible for the architecture and functional modulation of the post-synaptic density in schizophrenia patients 50 . Specifically, the PSD-95 is an integral part of the post-synaptic density and have attracted interest in schizophrenia pathophysiology based on GWASs 51 . Moreover, PSD-95 is involved in the targeting, clustering, and dynamic retention of AMPARs to post-synaptic densities 52 . Therefore, our results confirmed previous reports that changes in glutamate receptors may not be the only molecular event responsible for glutamate signaling perturbation in schizophrenia 5,53,54 since also alterations at the post-synaptic level downstream receptor activation may contribute to the emergence of schizophrenia pathophysiology.
Another key cluster of molecules discriminating schizophrenia patients and controls included L-Glu, mGluR2/3, mGluR5, along with EAAT2. Such a cluster captures a critical portion of molecular variation at both pre-and post-synaptic side of the glutamatergic synapse. Indeed, mGluR5 is mainly localized at post-synaptic level, where it is implicated in excitatory events mediating neural plasticity and cognitive processes 55 . Importantly, mGluR5 has been linked to schizophrenia pathophysiology 55,56 and regarded as a potential novel target for antipsychotic therapy with modulator agents 57,58 . On the other hand, mGluR2 and mGluR3 are found in various combinations of pre-synaptic, post-synaptic and glial localizations 59 . Moreover, GRM3 gene, encoding mGluR3, has been pinpointed as putative harbor for schizophrenia risk variants by structural and functional GWASs 60,61 and this association was confirmed by a comprehensive meta-analysis including 11,000 subjects. EAAT2 is expressed predominantly in astroglial cells and is regarded as the main glutamate transporter, responsible for the vast majority of glutamate clearance at the glutamate synapse level 62 . Interestingly, in agreement with our data that include mGluR2/3 and EAAT2 in a cluster discriminating schizophrenia and control subjects, previous studies identified reduced EAAT2 expression in the PFC of subjects with high-risk GRM3 haplotype associated with schizophrenia 63 , and highlighted multiple EAAT2 interactome-associated biological pathways alteration in the disorder 64 . Finally, in line with the strong tendency to reduction of EAAT2 in both the DLPFC and hippocampus of schizophrenia patients, compared with controls, other studies have previously revealed significant decrease in EAAT2 expression in the post-mortem DLPFC 65 and parahippocampal regions of schizophrenia subjects 66 .
Interestingly, when we used the Machine Learning hypothesisfree analysis, we identified in both DLPFC and hippocampus stable molecule pathways that discriminate schizophrenia patients from non-psychiatric controls, which could not be conceived using the hypothesis-driven approach. Indeed, the latter only allowed for assessing the association between the weighted linear combination of some molecule levels and the presence of the disease, although excluding the possibility to formulate any a priori assumption about the specific molecular patterns underlying such combination. Although possible associations between VGluT2, EAAT2, GAD67, D-Ser, and PSD-95 levels and the presence of the disease in the DLPFC were originally assessed in the hypothesisdriven approach, strongest interactions between D-Ser with D-Asp/ total Asp levels, as well as between GAD67 with VGluT2 levels, along with a marginal effect of GRIA1 and GRM5 mRNA Fig. 4 Results from iterative Random Forest detect interactions among glutamatergic synaptic components with predictive signatures of schizophrenia in the DLPFC. a Variable importance (VIMP), rescaled from 0% to 100% (relative VIMP) with respect to the maximum achieved value. Only variables (features) with VIMP > 0 are shown and ranked from the most (top) to the less (bottom) important. b Pathway of most stable interactions (stability score > 0.20) are reported in network graphs. c Accumulated Local Effect (ALE) was computed for each variable with VIMP > 0. ALE describes how the features influence the target (i.e. the predicted probability of having schizophrenia, estimated by iRF) on average. For instance, the ALE estimate of 0.10 at VGLuT2 = 25 means that when the VGluT2 has value 25, then the probability of having the schizophrenia is higher by 0.10 (about 10%) compared to the average probability which is defined at ALE = 0 (i.e. when VGluT2 = 75). The gray band is a confidence band for the regression line fitted in the estimated ALE points. d Partial Dependence Plot (PDP) was produced only for those features with top stable interactions (i.e. stability score > 0.60). PDPs show the marginal (total) effect that two features have on the predicted outcome. Colored zones locate those regions at which the disease more likely occurs (high-risk schizophrenia, yellow/green zone) and not occurs (low-risk schizophrenia, blue/violet zone). Individual observations (red: schizophrenia, blue: control) are plotted with respect to each feature combination. mRNA expressions are reported in logarithmic scale.
expressions, were found using only the hypothesis-free approach. Importantly, the Machine Learning algorithm also provided helpful insights into the molecular signatures of schizophrenia at the hippocampus level. Indeed, in such a brain region, surprising strong and stable pairwise interactions of the EAAT2 levels with SLC17A7 and GRIA2 transcripts, Synapsin-1 and CAMK2A were detected. Altogether, these results underline that different pattern of multiple interacting proteins both at glutamate pre-synaptic and post-synaptic level could account for discriminating patients from control subjects.
Our study has some weakness. First, our samples of postmortem brains from patients' group had significantly longer PMI compared with healthy individuals. Moreover, patients are on average younger than controls. However, this apparent discrepancy is in line with literature reporting a reduced life expectation in patients with schizophrenia 67-69 , compared with the general population. Based on these differences, we corrected our analyses for the potential confounding effect of PMI and age with the further inclusion of such variables in all statistical models we performed. Furthermore, one possible confounder of our results could be represented by the type and dosage of medication. With this regard, at the time of their exitus, the patients we analyzed were undergoing antipsychotic treatments with one or more of first-generation (Fluphenazine and Haloperidol) and secondgeneration antipsychotics (Risperidone, Olanzapine, Aripiprazole, Ziprasidone, Compazine, and Quetiapine) (Supplementary Table  3). Unfortunately, dosages of these antipsychotics were not available and, because of the fragmentary nature of the information we had with this respect, we could not correct results of our analyses for the effect of such medications. Nonetheless, it is worth noticing how evidence suggests that the overall profile of the antipsychotic treatments assumed by the sample we examined, with the exception of Quetiapine, was broadly characterized by comparable levels of D2 receptor blockade [70][71][72][73][74] .
Finally, the reported observations are related to the analysis of total homogenate samples. Therefore, we cannot discriminate the molecules on the basis of their synaptic localization within cell body, synaptosomal or extracellular fraction.
To our knowledge, this is the first work aiming to identify molecular signatures of schizophrenia by looking at multi levels variation of such a large number of elements implicated in the glutamate synapse. In conclusion, our results indicate that changes in the overall landscape of glutamate synapse more than alteration in single molecules underpin the pathophysiology of schizophrenia. This observation suggests, in turn, that future pharmacological strategies aiming to reduce symptoms of schizophrenia by targeting the glutamate system should be directed towards large interactomes operating within such a synapse, more than targeting one single molecule.

HPLC analysis
Post-mortem brain samples were homogenized in 1:10 (w/v) 0.2 M trichloroacetic acid. The samples were sonicated (3 cycles, 10 s each) and centrifuged at 13,000 × g for 20 min. Precipitated protein pellets were stored at −80°C for protein quantification 75 . Samples were then neutralized with 0.2 M NaOH and subjected to pre-column derivatization with o-phthaldialdehyde/N-acetyl-L-cysteine in 50% methanol. Diastereoisomer derivatives were resolved on a Simmetry C8 5-μm reversed-phase column (Waters, 4.6 × 250 mm) in isocratic conditions (0.1 M sodium acetate buffer, pH 6.2, 1% tetrahydrofuran, 1 ml/min flow rate) 76 Fig. 1b) and peak areas and then compared with those associated with external standards. D-Asp peak specificity was also evaluated by selective degradation catalyzed by a recombinant human Daspartate oxidase 77,78 . Human D-aspartate oxidase enzyme (12.5 μg) was added to the samples, incubated at 30°C for 3 h, and subsequently derivatized. Total protein content of homogenates was determined by Bradford assay method, after re-solubilization of the trichloroacetic acid precipitated protein pellets. The detected amino acids concentration was then normalized by the total protein content and expressed as nmol/mg protein. D-amino acid/total amino acid ratio was expressed as percentage (%).

RNA extraction and quantitative RT-PCR analysis
Total RNA was extracted from post-mortem tissues using RNeasy ® mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions (Querques et al., 2015). Total RNA was purified to eliminate potentially contaminating genomic DNA using recombinant DNase (Qiagen, Hilden, Germany). RNA integrity number (RIN) of samples was assessed using Agilent 2100 Bioanalyzer Expert (Santa Clara, California, USA) and Biorad Experion Automated electrophoresis Station (Hercules, CA) prior to cDNA synthesis using Transcriptor First Strand cDNA Synthesis kit (Roche Diagnostics, Mannheim, Germany). A total of 1 μg of total RNA of each sample was reverse transcribed with QuantiTect Reverse Transcription (Qiagen, Hilden, Germany) using oligo-dT and random primers according to the manufacturer's instructions. Quantitative RT-PCR with Real Time ready catalog Assays (Roche Diagnostics) and LightCycler ® 480 Probe Master (Roche Diagnostics) was performed on a Light Cycler 480 Real Time PCR thermocycler with 96-well format (Roche Diagnostics). All measurements from each subject were performed in duplicate. The following protocol was used: 10 s for initial denaturation at 95°C followed by 40 cycles consisting of 10 s at 94°C for denaturation, 10 s at 60°C for annealing, and 6 s for elongation at 72°C temperature 79 . The primers used for GRIN1, GRIN2A, GRIN2B, GRIA1, GRIA2, GRIA3, GRM1, GRM2, GRM3, GRM5, Homer1, DLG4, GAD1, GAD2, SLC1A3, SLC1A2, SLC1A7, SLC1A6, CAMK2A, SYN1 mRNA amplification are listed in Supplementary Table 1. mRNA expression levels were normalized to the mean of two housekeeping genes: β-actin (ACTB) and cyclophilin (PPIA). mRNA expression was calculated using the geometric mean of the two reference genes selected and the relative quantification method (2 −ΔΔCt ). Fig. 5 Results from iterative Random Forest detect interactions among glutamatergic synaptic components with predictive signatures of schizophrenia in the hippocampus. a Variable importance (VIMP), rescaled from 0% to 100% (relative VIMP) with respect to the maximum achieved value, is reported on panel (a). Only variables (features) with VIMP > 0 are shown and were ranked from the most (top) to the less (bottom) important. b Pathway of most stable interactions (i.e. with stability score > 0.20) are reported in network graphs. (c) Accumulated Local Effect (ALE) was computed for each variable with VIMP > 0. ALE describes how the features influence the target (i.e. the predicted probability of having schizophrenia, estimated by iRF) on average. For instance, the ALE estimate of −0.25 at EAAT2 = 150 means that when the EAAT2 has value 150, then the probability of having the schizophrenia is lower by 0.25 (about 25%) compared to the average probability which is defined at ALE = 0 (i.e. when EAAT2 = 50). The gray band is a confidence band for the regression line fitted in the estimated ALE points. d Partial Dependence Plot (PDP) was produced only for those features with top stable interactions (i.e. stability score > 0.60). PDPs show the marginal (total) effect that two features have on the predicted outcome. Colored zones locate those regions at which the disease more likely occurs (high-risk schizophrenia, yellow/green zone) and not occurs (low-risk schizophrenia, blue/violet zone). Individual observations (red: schizophrenia, blue: control) are plotted with respect to each feature combination. mRNA expressions are reported in logarithmic scale.

Western blotting
Frozen, powdered samples from post-mortem DLPFC and hippocampus tissues were sonicated in 1% SDS and boiled for 10 min. Aliquots (2 µl) of the homogenate were used for protein determination using a Bio-Rad Protein Assay kit. Equal amounts of total proteins (30 µg) for each sample were loaded on pre-cast 4-20% gradient gel (BioRad Laboratories). Proteins were separated by SDS-PAGE and transferred to PVDF membranes (GE Healthcare) using Trans Blot Turbo System. Membranes were immunoblotted overnight using the following primary antibodies: GluN1, GluN2A, GluN2B, GluA1, GluA2/3, GluA4, mGluR1, mGluR2/3, mGluR5, Homer1b/c, PSD-95, GAD65, GAD67, EAAT1, EAAT2, VGluT1, VGluT2, Synapsin-1, CaMKIIα, Thr-286-P-CaMKIIα (antibodies specimens are listed Supplementary Table 2. Blots were then incubated with α-rabbit or α-mouse horseradish peroxidase conjugated secondary antibodies. Immunoreactivity was detected by enhanced chemiluminescence (ECL) (GE-Healthcare) and quantified by Quantity One software (Bio-Rad). Optical density values were normalized to GAPDH for variations in loading and transfer. Normalized values were then averaged and used for statistical comparisons. All blots derive from the same experiment and were processed in parallel.

Statistical methods
Data are reported as medians, along with interquartile range (first-third quartiles-IQR), and as absolute and relative frequency (percentages) for continuous and categorical variables, respectively. The normality assumption was assessed by the Shapiro-Wilk test. For continuous variables with right-skewed distribution, statistical analyses were performed using their log-transformed values. Comparisons of clinical characteristics (age at deceased, gender, PMI, pH) between schizophrenia patients and controls were performed using (two-tailed) two-sample t-test or Chi-Square statistic with Yates's correction for continuous and categorical variables, respectively. Age and PMI-adjusted comparisons between schizophrenia patients and controls were performed by ANCOVA models and p-values were also corrected for multiple testing, following the Bonferroni method. For t-tests and ANCOVAs, t-values and F-values along with degrees of freedom were also provided, respectively. Furthermore, to assess whether a linear combination of multiple molecules of the synaptic components was predictive of the presence of schizophrenia, a multivariable logistic model, which included both the molecules as main effects and the strongest confounders (i.e., age, PMI) as covariates, was performed and compared to the one which included confounders only by the deviance test. Results were reported as odds ratio (OR), along with their 95% confidence interval (CI). Unknown patterns of multiple molecules of the synaptic components were detected by the Iterative Random Forest (iRF) algorithm 80 , using a complete dataset with imputed missing values. The imputation was performed by the Multivariate imputation by chained equations (MICE) algorithm 81 with 10 chains of multiple imputations and 50 iterations per chain, using a random forest of 10 trees per each iteration (see the paragraph: "Handling missing values" in the Supplemental Statistical Methods section of the Supplemental Information for further details). The iRF is an ensemble of machine learning (model-agnostic) method for classification and regression that operates by constructing a multitude of decision trees. iRF is a generalization of a Random Forest (RF) and is commonly used to train a feature-weighted ensemble of decision trees to detect stable and high-order interactions 80 . As well as in a RF, each decision tree in the iRF is built on a bootstrap sample from the original dataset. The portion of the bootstrap dataset not used for the building of each tree is called Out Of Bag (OOB) data and is employed to get both an unbiased estimate of the RF prediction error (i.e. the Brier Score) and an estimate of a "variable importance" (VIMP). The predicted individual probability of having the disease is computed as the average of all probabilities over all trees in the forest estimated in OOB data for that individual and the Brier Score is computed as the mean squared difference between such predicted probabilities and the actual outcomes. The Brier Score varies from 0 (i.e. RF is perfectly calibrated) to 1 (i.e. RF is perfectly miscalibrated). To address the between groups imbalances (i.e. adjusting the analysis by subjects' age and PMIs) in the iRFs, new individual weights were estimated following the Inverse Probability Weighting method 82 (as a first step) and then such Inverse Probability Weights (IPWs) were supplied to iRFs (as a second step). Because of this, observations with higher IPWs are selected more frequently into each bootstrap sample, which will be used to build each decision tree of the forest, with respect to those with lower IPWs (see the paragraph: "Handling imbalance data between patients and controls in the iRF algorithms" in the Supplemental Statistical Methods section of the Supplemental Information for further details). In the iRF algorithm, a RF will be iteratively performed K times. At the first iteration, a subsample of candidate variables (i.e. features) will be randomly selected at each split of a decision tree. On the basis of the variable importance and a regularization factor, new weights will be assigned to each variable so that, at the next iteration, variables with higher weights will be selected with higher probability than the others. Therefore, at the last iteration, the iRF will include regularized trees and decision rules extracted from such feature-weighted RF are mapped 83 . This mapping allows to identify prevalent interactions in the RF through a computationally efficient algorithm (i.e. generalized Random Intersection Trees-RIT-algorithm 80 ) that searches for high-order interactions in binary data. A bagging step eventually assesses the stability of recovered interactions with respect to the bootstrap perturbation of the data. The proportion of times (out of B bootstrap samples) an interaction appears as an output of the RIT defines a "stability score" (i.e. 0 = totally instable interaction, 1 = totally stable interaction). The following parameters must be set to enable the iRF training, some of them were fixed in advance whereas some others were determined after a "tuning phase": (1) the number of random forest iterations: from 1 to 10 iterations were evaluated during the tuning phase; (2) the number of the trees included into the random forest (within each iteration): from 50 to 100,000 trees were evaluated during the tuning phase; (3) the choice of the variable regularization factor, where possible fixed values were: 1.0 (no regularization), 0.9 (weak regularization), 0.8 (moderate regularization), <0.8 (strong regularization) and were evaluated during the tuning phase; (4) the number of randomly chosen features that possibly split at in each node of the tree: this parameter was fixed to seven features; (5) the number of outer-layer bootstrap samples: this parameter was fixed to 30; (6) the node splitting criterion: Gini impurity measure; (7) the minimal node size: it was fixed that the final leaves of each tree in the forest must include at least five subjects. The "tuning phase" consists in a grid search of the optimal parameters combination that minimize the Brier Score achieved by iRF in the OOB data (see the paragraph: "Sensitivity of iRFs algorithms to tuning parameters" in the Supplemental Statistical Methods section of the Supplemental Information for further details). Accumulated local effects (ALE) and partial dependence plots (PDP) were performed to better quantify changes in disease probabilities at different variable values and detect the direction of the "most stable" interactions found by the RIT, respectively. To define a single classification rule, a tree-growing algorithm that recursively splits data into subgroups (i.e. Classification And Regression Tree) was eventually performed and the choice of tree size was determined on the basis of 10-fold cross validation of the prediction error. The discriminatory ability of both models and machine-learning algorithms was assessed by the Area Under the ROC Curve (AUC), along with its 95% CI computed with 1000 stratified bootstrap replicates. Further details about statistical analysis can be found in Supplemental Information. A p-value < 0.05 was considered for statistical significance. Statistical analyses and plots were performed using R foundation for statistical computing, Vienna, Austria (version 4.04).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.